lectures.alex.balgavy.eu

Lecture notes from university.
git clone git://git.alex.balgavy.eu/lectures.alex.balgavy.eu.git

Programming reference.html (12991B)


      1 
      2 				<!DOCTYPE html>
      3 				<html>
      4 					<head>
      5 						<meta charset="UTF-8">
      6 						<link rel="stylesheet" href="pluginAssets/highlight.js/atom-one-light.css">
      7 						<title>Programming reference</title>
      8 					<link rel="stylesheet" href="pluginAssets/katex/katex.css" /><link rel="stylesheet" href="./style.css" /></head>
      9 					<body>
     10 
     11 <div id="rendered-md"><h1 id="numpy-matplotlib">Numpy &amp; matplotlib</h1>
     12 <p>Load external file:</p>
      13 <pre class="hljs"><code>data = np.loadtxt(<span class="hljs-string">'./filepath.csv'</span>, delimiter=<span class="hljs-string">','</span>) <span class="hljs-comment"># assumes numpy was imported as: import numpy as np</span>
     14 </code></pre>
     15 <p>Print information about data:</p>
     16 <pre class="hljs"><code>data.shape
     17 </code></pre>
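<p>A few other quick checks (standard numpy attributes and methods) are often useful before plotting; a minimal sketch:</p>
<pre class="hljs"><code>data.dtype          # element type
data[:5]            # first five rows
data.min(axis=0)    # per-column minimum
data.max(axis=0)    # per-column maximum
</code></pre>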
     18 <p>Graph two columns of data:</p>
     19 <pre class="hljs"><code><span class="hljs-keyword">import</span> matplotlib.pyplot <span class="hljs-keyword">as</span> plt
     20 %matplotlib inline
     21 x = data[:,<span class="hljs-number">0</span>]
     22 y = data[:,<span class="hljs-number">1</span>]
     23 <span class="hljs-comment"># includes size and transparency setting, specifies third column to use for color</span>
     24 plt.scatter(x, y, s=<span class="hljs-number">3</span>, alpha=<span class="hljs-number">0.2</span>, c=data[:,<span class="hljs-number">2</span>], cmap=<span class="hljs-string">'RdYlBu_r'</span>)
     25 plt.xlabel(<span class="hljs-string">'x axis'</span>)
     26 plt.ylabel(<span class="hljs-string">'y axis'</span>);
     27 </code></pre>
     28 <p>Histogram plotting:</p>
      29 <pre class="hljs"><code><span class="hljs-comment"># bins sets the number of bars, range sets the interval covered</span>
      30 plt.hist(data, bins=<span class="hljs-number">100</span>, range=[start, end])
     31 </code></pre>
     32 <p>The identity matrix:</p>
     33 <pre class="hljs"><code>np.eye(<span class="hljs-number">2</span>) <span class="hljs-comment"># for a 2x2 matrix</span>
     34 </code></pre>
     35 <p>Matrix multiplication:</p>
     36 <pre class="hljs"><code>a * b       <span class="hljs-comment"># element-wise</span>
      37 a.dot(b)    <span class="hljs-comment"># matrix product (equivalent to a @ b)</span>
     38 </code></pre>
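<p>A small worked example (with arbitrarily chosen values) makes the difference explicit; since Python 3.5 the <code class="inline-code">@</code> operator is equivalent to <code class="inline-code">.dot</code> for 2-D arrays:</p>
<pre class="hljs"><code>import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

a * b      # element-wise: [[ 5, 12], [21, 32]]
a.dot(b)   # matrix product: [[19, 22], [43, 50]]
a @ b      # same as a.dot(b) for 2-D arrays
</code></pre>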
     39 <p>Useful references:</p>
     40 <ul>
     41 <li><a data-from-md  title='https://docs.scipy.org/doc/numpy-dev/user/quickstart.html' href='https://docs.scipy.org/doc/numpy-dev/user/quickstart.html' type=''>The official numpy quickstart guide</a></li>
     42 <li><a data-from-md  title='https://www.datacamp.com/community/tutorials/python-numpy-tutorial' href='https://www.datacamp.com/community/tutorials/python-numpy-tutorial' type=''>A more in-depth tutorial, with in-browser samples</a></li>
     43 <li><a data-from-md  title='http://cs231n.github.io/python-numpy-tutorial/' href='http://cs231n.github.io/python-numpy-tutorial/' type=''>A very good walk through the most important functions and features</a>. From the famous <a data-from-md  title='http://cs231n.github.io/' href='http://cs231n.github.io/' type=''>CS231n course</a>, from Stanford.</li>
      44 <li><a data-from-md  title='https://matplotlib.org/users/pyplot_tutorial.html' href='https://matplotlib.org/users/pyplot_tutorial.html' type=''>The official pyplot tutorial</a>. Note that pyplot can accept basic Python lists as well as numpy data (see the short example after this list).</li>
     45 <li><a data-from-md  title='https://matplotlib.org/gallery.html' href='https://matplotlib.org/gallery.html' type=''>A gallery of example MPL plots</a>. Most of these do not use the pyplot state-machine interface, but the more low level objects like <a data-from-md  title='https://matplotlib.org/api/axes_api.html' href='https://matplotlib.org/api/axes_api.html' type=''>Axes</a>.</li>
     46 <li><a data-from-md  title='http://www.scipy-lectures.org/intro/matplotlib/matplotlib.html' href='http://www.scipy-lectures.org/intro/matplotlib/matplotlib.html' type=''>In-depth walk through the main features and plot types</a></li>
     47 </ul>
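<p>For example, as the pyplot tutorial entry above notes, plain Python lists can be passed directly (a minimal sketch):</p>
<pre class="hljs"><code>plt.plot([1, 2, 3, 4], [1, 4, 9, 16])  # x and y given as ordinary lists
plt.xlabel('x')
plt.ylabel('squares');
</code></pre>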
     48 <h1 id="sklearn">Sklearn</h1>
     49 <p>Split data into train and test, on features <code class="inline-code">x</code> and target <code class="inline-code">y</code>:</p>
     50 <pre class="hljs"><code><span class="hljs-keyword">from</span> sklearn.model_selection <span class="hljs-keyword">import</span> train_test_split
     51 x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=<span class="hljs-number">0.5</span>)
     52 </code></pre>
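<p>The split is random by default; passing <code class="inline-code">random_state</code> makes it reproducible (a minimal sketch, seed value chosen arbitrarily):</p>
<pre class="hljs"><code>x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.5, random_state=0)
</code></pre>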
      53 <p>An estimator implements the method <code class="inline-code">fit(x, y)</code>, which learns from data, and <code class="inline-code">predict(T)</code>, which takes new instances and predicts their target values.</p>
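<p>A minimal sketch of this interface, using the k-nearest-neighbours classifier shown further down (any estimator follows the same pattern):</p>
<pre class="hljs"><code>from sklearn.neighbors import KNeighborsClassifier

model = KNeighborsClassifier(5)       # construct the estimator
model.fit(x_train, y_train)           # learn from the training data
predictions = model.predict(x_test)   # predict targets for new instances
model.score(x_test, y_test)           # classifiers also report accuracy via score()
</code></pre>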
     54 <p>Linear classifier, using SVC model with linear kernel:</p>
     55 <pre class="hljs"><code><span class="hljs-keyword">from</span> sklearn.svm <span class="hljs-keyword">import</span> SVC
     56 linear = SVC(kernel=<span class="hljs-string">'linear'</span>)
     57 linear.fit(x_train, y_train)
     58 </code></pre>
     59 <p>Decision tree classifier:</p>
     60 <pre class="hljs"><code><span class="hljs-keyword">from</span> sklearn.tree <span class="hljs-keyword">import</span> DecisionTreeClassifier
     61 tree = DecisionTreeClassifier()
     62 tree.fit(x_train, y_train)
     63 </code></pre>
     64 <p>k-Nearest Neighbors:</p>
     65 <pre class="hljs"><code><span class="hljs-keyword">from</span> sklearn.neighbors <span class="hljs-keyword">import</span> KNeighborsClassifier
     66 knn = KNeighborsClassifier(<span class="hljs-number">15</span>) <span class="hljs-comment"># We set the number of neighbors to 15</span>
     67 knn.fit(x_train, y_train)
     68 </code></pre>
     69 <p>Try to classify new data:</p>
      70 <pre class="hljs"><code>linear.predict(some_data) <span class="hljs-comment"># some_data: 2-D array with the same feature columns as x_train</span>
     71 </code></pre>
     72 <p>Compute accuracy on testing data:</p>
     73 <pre class="hljs"><code><span class="hljs-keyword">from</span> sklearn.metrics <span class="hljs-keyword">import</span> accuracy_score
     74 y_predicted = linear.predict(x_test)
     75 accuracy_score(y_test, y_predicted)
     76 </code></pre>
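<p>The same pattern works for each classifier trained above; a short loop (assuming the <code class="inline-code">linear</code>, <code class="inline-code">tree</code> and <code class="inline-code">knn</code> variables from the previous blocks) makes the comparison explicit:</p>
<pre class="hljs"><code>for name, clf in [('linear', linear), ('tree', tree), ('knn', knn)]:
    print(name, accuracy_score(y_test, clf.predict(x_test)))
</code></pre>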
      77 <p>Make a plot of the classification, with colors showing the classifier's decision:</p>
     78 <pre class="hljs"><code><span class="hljs-keyword">from</span> mlxtend.plotting <span class="hljs-keyword">import</span> plot_decision_regions
      79 plot_decision_regions(x_test[:<span class="hljs-number">500</span>], y_test.astype(int)[:<span class="hljs-number">500</span>], clf=linear, res=<span class="hljs-number">0.1</span>);
     80 </code></pre>
     81 <p>Compare classifiers via ROC curve:</p>
     82 <pre class="hljs"><code><span class="hljs-keyword">from</span> sklearn.metrics <span class="hljs-keyword">import</span> roc_curve, auc
     83 
     84 <span class="hljs-comment"># The linear classifier doesn't produce class probabilities by default. We'll retrain it for probabilities.</span>
     85 linear = SVC(kernel=<span class="hljs-string">'linear'</span>, probability=<span class="hljs-literal">True</span>)
     86 linear.fit(x_train, y_train)
     87 
     88 <span class="hljs-comment"># We'll need class probabilities from each of the classifiers</span>
     89 y_linear = linear.predict_proba(x_test)
     90 y_tree  = tree.predict_proba(x_test)
     91 y_knn   = knn.predict_proba(x_test)
     92 
     93 <span class="hljs-comment"># Compute the points on the curve</span>
      94 <span class="hljs-comment"># We pass the probability of the positive (second) class as the y_score</span>
      95 curve_linear = roc_curve(y_test, y_linear[:, <span class="hljs-number">1</span>])
      96 curve_tree   = roc_curve(y_test, y_tree[:, <span class="hljs-number">1</span>])
      97 curve_knn    = roc_curve(y_test, y_knn[:, <span class="hljs-number">1</span>])
     98 
     99 <span class="hljs-comment"># Compute Area Under the Curve</span>
    100 auc_linear = auc(curve_linear[<span class="hljs-number">0</span>], curve_linear[<span class="hljs-number">1</span>])
    101 auc_tree   = auc(curve_tree[<span class="hljs-number">0</span>], curve_tree[<span class="hljs-number">1</span>])
    102 auc_knn    = auc(curve_knn[<span class="hljs-number">0</span>], curve_knn[<span class="hljs-number">1</span>])
    103 
    104 plt.plot(curve_linear[<span class="hljs-number">0</span>], curve_linear[<span class="hljs-number">1</span>], label=<span class="hljs-string">'linear (area = %0.2f)'</span> % auc_linear)
    105 plt.plot(curve_tree[<span class="hljs-number">0</span>], curve_tree[<span class="hljs-number">1</span>], label=<span class="hljs-string">'tree (area = %0.2f)'</span> % auc_tree)
    106 plt.plot(curve_knn[<span class="hljs-number">0</span>], curve_knn[<span class="hljs-number">1</span>], label=<span class="hljs-string">'knn (area = %0.2f)'</span>% auc_knn)
    107 
    108 plt.xlim([<span class="hljs-number">0.0</span>, <span class="hljs-number">1.0</span>])
    109 plt.ylim([<span class="hljs-number">0.0</span>, <span class="hljs-number">1.0</span>])
    110 plt.xlabel(<span class="hljs-string">'False Positive Rate'</span>)
    111 plt.ylabel(<span class="hljs-string">'True Positive Rate'</span>)
    112 plt.title(<span class="hljs-string">'ROC curve'</span>);
    113 
    114 plt.legend();
    115 </code></pre>
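<p>If only the AUC values are needed, and not the curves themselves, <code class="inline-code">roc_auc_score</code> is a shortcut for this binary problem (a minimal sketch reusing the class probabilities computed above):</p>
<pre class="hljs"><code>from sklearn.metrics import roc_auc_score

roc_auc_score(y_test, y_linear[:, 1])  # same value as auc_linear above
</code></pre>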
    116 <p>Cross-validation:</p>
    117 <pre class="hljs"><code><span class="hljs-keyword">from</span> sklearn.model_selection <span class="hljs-keyword">import</span> cross_val_score
    118 <span class="hljs-keyword">from</span> sklearn.metrics <span class="hljs-keyword">import</span> roc_auc_score, make_scorer
    119 
    120 <span class="hljs-comment"># The cross_val_score function does all the training for us. We simply pass</span>
    121 <span class="hljs-comment"># it the complete data, the model, and the metric.</span>
    122 
    123 linear = SVC(kernel=<span class="hljs-string">'linear'</span>, probability=<span class="hljs-literal">True</span>)
    124 
     125 <span class="hljs-comment"># Train for 3 folds (cv=3), returning ROC AUC. You can also try 'accuracy' as a scorer</span>
    126 scores = cross_val_score(linear, x, y, cv=<span class="hljs-number">3</span>, scoring=<span class="hljs-string">'roc_auc'</span>)
    127 
    128 print(<span class="hljs-string">'scores per fold '</span>, scores)
    129 </code></pre>
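<p>The per-fold scores are usually summarised by their mean and standard deviation:</p>
<pre class="hljs"><code>print('mean AUC %0.2f (+/- %0.2f)' % (scores.mean(), scores.std()))
</code></pre>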
    130 <p>Regression:</p>
    131 <pre class="hljs"><code><span class="hljs-keyword">from</span> sklearn <span class="hljs-keyword">import</span> datasets
    132 <span class="hljs-keyword">from</span> sklearn.metrics <span class="hljs-keyword">import</span> mean_squared_error, r2_score
    133 
    134 <span class="hljs-comment"># Load the diabetes dataset, and select one feature (Body Mass Index)</span>
     135 x, y = datasets.load_diabetes(return_X_y=<span class="hljs-literal">True</span>)
    136 x = x[:, <span class="hljs-number">2</span>].reshape(<span class="hljs-number">-1</span>, <span class="hljs-number">1</span>)
    137 
    138 <span class="hljs-comment"># -- the reshape operation ensures that x still has two dimensions</span>
    139 <span class="hljs-comment"># (that is, we need it to be an n by 1 matrix, not a vector)</span>
    140 
    141 x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=<span class="hljs-number">0.5</span>)
    142 
    143 <span class="hljs-comment"># feature space on horizontal axis, output space on vertical axis</span>
    144 plt.scatter(x_train[:, <span class="hljs-number">0</span>], y_train)
    145 plt.xlabel(<span class="hljs-string">'BMI'</span>)
    146 plt.ylabel(<span class="hljs-string">'disease progression'</span>);
    147 
    148 <span class="hljs-comment"># Train three models: linear regression, tree regression, knn regression</span>
    149 <span class="hljs-keyword">from</span> sklearn.linear_model <span class="hljs-keyword">import</span> LinearRegression
    150 linear = LinearRegression()
    151 linear.fit(x_train, y_train)
    152 
    153 <span class="hljs-keyword">from</span> sklearn.tree <span class="hljs-keyword">import</span> DecisionTreeRegressor
    154 tree = DecisionTreeRegressor()
    155 tree.fit(x_train, y_train)
    156 
    157 <span class="hljs-keyword">from</span> sklearn.neighbors <span class="hljs-keyword">import</span> KNeighborsRegressor
    158 knn = KNeighborsRegressor(<span class="hljs-number">10</span>)
    159 knn.fit(x_train, y_train);
    160 
    161 <span class="hljs-comment"># Plot the models</span>
    162 <span class="hljs-keyword">from</span> sklearn.metrics <span class="hljs-keyword">import</span> mean_squared_error
    163 
    164 plt.scatter(x_train, y_train, alpha=<span class="hljs-number">0.1</span>)
    165 
    166 xlin = np.linspace(<span class="hljs-number">-0.10</span>, <span class="hljs-number">0.2</span>, <span class="hljs-number">500</span>).reshape(<span class="hljs-number">-1</span>, <span class="hljs-number">1</span>)
    167 plt.plot(xlin, linear.predict(xlin), label=<span class="hljs-string">'linear'</span>)
     168 plt.plot(xlin, tree.predict(xlin), label=<span class="hljs-string">'tree'</span>)
     169 plt.plot(xlin, knn.predict(xlin), label=<span class="hljs-string">'knn'</span>)
    170 
    171 print(<span class="hljs-string">'MSE linear '</span>, mean_squared_error(y_test, linear.predict(x_test)))
    172 print(<span class="hljs-string">'MSE tree '</span>, mean_squared_error(y_test, tree.predict(x_test)))
    173 print(<span class="hljs-string">'MSE knn'</span>, mean_squared_error(y_test, knn.predict(x_test)))
    174 
    175 plt.legend();
    176 </code></pre>
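<p>The <code class="inline-code">r2_score</code> imported above can be used in the same way as <code class="inline-code">mean_squared_error</code>, reporting the coefficient of determination (1.0 means a perfect fit):</p>
<pre class="hljs"><code>print('R^2 linear', r2_score(y_test, linear.predict(x_test)))
print('R^2 tree  ', r2_score(y_test, tree.predict(x_test)))
print('R^2 knn   ', r2_score(y_test, knn.predict(x_test)))
</code></pre>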
    177 <p>Useful references:</p>
    178 <ul>
    179 <li><a data-from-md  title='http://scikit-learn.org/stable/tutorial/basic/tutorial.html' href='http://scikit-learn.org/stable/tutorial/basic/tutorial.html' type=''>The official quickstart guide</a></li>
    180 <li><a data-from-md  title='https://www.datacamp.com/community/tutorials/machine-learning-python' href='https://www.datacamp.com/community/tutorials/machine-learning-python' type=''>A DataCamp tutorial with interactive exercises</a></li>
    181 <li><a data-from-md  title='http://scikit-learn.org/stable/tutorial/text_analytics/working_with_text_data.html' href='http://scikit-learn.org/stable/tutorial/text_analytics/working_with_text_data.html' type=''>Analyzing text data with SKLearn</a></li>
    182 </ul>
     183 </div>
    184 					</body>
    185 				</html>